University of Konstanz


VAST 2009 Challenge
Challenge 3 - Video Analysis

Authors and Affiliations:

Dr. Peter Bak, University of Konstanz, bak@dbvis.inf.uni-konstanz.de [PRIMARY contact]

Patrick Jungk, University of Konstanz, patrick.jungk@uni-konstanz.de [lead development, analyst]

Tool(s):

VAT – Video Analysing Tool

Developed at: University of Konstanz

by: Patrick Jungk

Version 1.0




KNIME – Konstanz Information Miner

Developed at: University of Konstanz

by: KNIME CORE TEAM

Version 2.0.3

KNIME, pronounced [naim], is a modular data exploration platform that enables the user to visually create data flows (often referred to as pipelines), selectively execute some or all analysis steps, and later investigate the results through interactive views on data and models. […]

www.knime.org

Video:

VAST.wmv

 

 

ANSWERS:

Short Answer


Figure 1: Analysis process of video data




In order to reduce the data sufficiently for successful information extraction, the following basic steps are indispensable:

The basic concept is to determine events by detecting moving objects and surrounding them with bounding boxes for classification. Afterwards, suspicious

behavioural patterns can be detected automatically and verified manually.




Figure 2: Detection of possible events requires data reduction (part below shows the determined patterns)



The retrieved data can be evaluated quickly by visualizing the relevant

patterns. The initial data can be reduced to less than 1% of its original volume, which greatly decreases the expenditure of human labour for

long videos of more than 1 h.



MC3.1: Provide a tab-delimited table containing the location, start time and duration of the events identified above. Please name the file Video.txt and place it in the same directory as your index.htm file.  Please see the format required in the Task Descriptions.

Video.txt


MC3.2:  Identify any events of potential counterintelligence/espionage interest in the video.  Provide a Detailed Answer, including a description of any activities, and why the event is of interest. 

Table of contents:

1 Preliminary Considerations

2 Analysis

2.1 Determination of bounding boxes

2.2 Classification of Bounding Boxes

3 Determination of Suspicious Events

4 Result

4.1 Suspicious Events

4.2 Performance Comparison of Automatic and Interactive Parts

4.3 Data Reduction

4.4 Conclusion



List of figures

Figure 1: Classification needs interactive user involvement

Figure 2: KDD pipeline (Fayyad, U., Piatetsky-Shapiro, G. and Smyth, P., 1996)

Figure 3: Analysis process of video data

Figure 4: Classification needs user interaction and computing using prediction algorithms

Figure 5: Patterns can be verified manually and marked for export to a result table



List of Tables

Table 1: Mapped colours to the classified bounding boxes for visualisation

Table 2: Comparison of user and hardware process times for video 1

Table 3: Data reduction of video one leads to relevant events



1 Preliminary Considerations

To identify any events of potential counterintelligence/espionage interest, a definition of such a suspicious event needs to be given. The following events were defined as suspicious:

These events need to be described in a formal way as behavioural patterns. To recognize such an event, the following dimensions have to be considered as well:

Suspicious items are as follows:

Suspicious areas are as follows:

Those areas specify the areas of interest. In order to determine events, items within an area have to be recognized. Therefore, areas of movement within the video need to be detected, since every moving object may indicate suspicious activity. Those areas of movement are to be marked and classified, see figure 1.


Figure 1: Classification needs interactive user involvement
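The area test described above can be sketched as a simple rectangle-overlap check; a bounding box is of interest if it intersects one of the predefined areas. This is an illustrative sketch: the coordinates and the area definitions are hypothetical, not taken from the tool.

```python
# Sketch: a suspicious area is an axis-aligned rectangle in frame coordinates;
# a detected bounding box is of interest if it overlaps such an area.
# The coordinates below are made up for illustration.

def overlaps(box, area):
    """Both arguments are (x_min, y_min, x_max, y_max) rectangles."""
    bx0, by0, bx1, by1 = box
    ax0, ay0, ax1, ay1 = area
    return bx0 <= ax1 and ax0 <= bx1 and by0 <= ay1 and ay0 <= by1

suspicious_areas = [(100, 50, 200, 150)]  # assumed area of interest
box = (180, 140, 220, 170)                # a detected moving object
print(any(overlaps(box, a) for a in suspicious_areas))  # True
```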




The following types are considered potentially suspicious and need to be determined.

All other moving areas are not considered suspicious. This means those areas are irrelevant and can be excluded.

2 Analysis

To analyse the video data, an interactive process based on the KDD (Knowledge Discovery in Databases) pipeline (Fayyad, U., Piatetsky-Shapiro, G. and Smyth, P. From Data Mining to Knowledge Discovery: An Overview. Advances in Knowledge Discovery and Data Mining. (1996), 1-34.) was used, as shown in figure 2.


Figure 2: KDD pipeline (Fayyad, U., Piatetsky-Shapiro, G. and Smyth, P. From Data Mining to Knowledge Discovery: An Overview. Advances in Knowledge Discovery and Data Mining. (1996), 1-34., http://www.aaai.org/aitopics/assets/PDF/AIMag17-03-2-article.pdf)




Following this terminology, a flow chart was created describing the operative steps required to conduct a successful analysis of video stream data.


Figure 3: Analysis process of video data



In order to extract the data from the video, thresholds have to be set manually by the user. The most important thresholds are as follows.
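One such threshold is the grey-value difference that decides whether a pixel counts as moving. A minimal frame-differencing sketch, assuming frames are plain 2D lists of grey values and a manually chosen threshold (the value below is an assumption, not taken from VAT):

```python
# Frame differencing: pixels whose grey value changed by more than a
# user-set threshold are marked as "moving" in a binary mask.

DIFF_THRESHOLD = 30  # assumed threshold, tuned manually by the analyst

def motion_mask(prev_frame, curr_frame, threshold=DIFF_THRESHOLD):
    """Return a binary mask marking pixels whose grey value changed."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prow, crow)]
        for prow, crow in zip(prev_frame, curr_frame)
    ]

prev = [[10, 10, 10], [10, 10, 10]]
curr = [[10, 90, 10], [10, 95, 10]]
print(motion_mask(prev, curr))  # [[0, 1, 0], [0, 1, 0]]
```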



2.1 Determination of bounding boxes

The determination chain yields the bounding boxes inside a frame. Figure 1 shows the result of the automatic determination of bounding boxes. The colour bar below the frame preview shows the count of bounding boxes over time; each line stands for one location, starting with the first one. The lighter the colour, the more bounding boxes were found in that time slot. Each movement of the camera position indicates a change of location, which yields the location information relevant for the result.
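The step from a binary motion mask to bounding boxes can be sketched as a connected-components pass: each blob of moving pixels is surrounded by an axis-aligned box. This is a pure-Python illustration; the tool's actual implementation and connectivity rules may differ.

```python
# Connected components (4-connected, BFS) over a binary motion mask;
# each component is reduced to its bounding box.
from collections import deque

def bounding_boxes(mask):
    """mask: 2D list of 0/1. Returns (x_min, y_min, x_max, y_max) per blob."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                q = deque([(x, y)])
                seen[y][x] = True
                x0 = x1 = x
                y0 = y1 = y
                while q:
                    cx, cy = q.popleft()
                    x0, x1 = min(x0, cx), max(x1, cx)
                    y0, y1 = min(y0, cy), max(y1, cy)
                    for nx, ny in ((cx + 1, cy), (cx - 1, cy),
                                   (cx, cy + 1), (cx, cy - 1)):
                        if 0 <= nx < w and 0 <= ny < h \
                                and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((nx, ny))
                boxes.append((x0, y0, x1, y1))
    return boxes

mask = [[0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 1]]
print(bounding_boxes(mask))  # [(1, 0, 2, 1), (3, 2, 3, 2)]
```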



2.2 Classification of Bounding Boxes

The process of classification is an interactive process divided into two sub-processes:

  1. manual classification of a subset of bounding boxes (training)

  2. automatic classification of the remaining bounding boxes using a neural network (Multi Layer Perceptron Predictor) or a Decision Tree Predictor
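The two-step idea can be sketched with a toy classifier: a manually labelled subset trains a model that then predicts the remaining boxes. A 1-nearest-neighbour classifier on simple box features stands in here for KNIME's Multi Layer Perceptron / Decision Tree predictors; the features and labels are assumptions for illustration.

```python
# Sketch of the classify-then-predict step: 1-NN on (width, height) of each
# bounding box, standing in for the MLP / decision-tree predictors in KNIME.

def features(box):
    """Width and height of an (x_min, y_min, x_max, y_max) box."""
    x0, y0, x1, y1 = box
    return (x1 - x0, y1 - y0)

def predict(training, box):
    """training: list of (box, label) pairs from the manual step."""
    fx, fy = features(box)
    def dist(item):
        tx, ty = features(item[0])
        return (tx - fx) ** 2 + (ty - fy) ** 2
    return min(training, key=dist)[1]

training = [((0, 0, 8, 20), "human"),  # tall, narrow blob
            ((0, 0, 40, 18), "car")]   # wide, flat blob
print(predict(training, (5, 5, 14, 26)))  # human
```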

No.   name        colour   R    G    B
1     human       green    77   157  74
2     two humans  orange   255  127  0
3     car         red      228  26   28
4     two cars    blue     126  126  184

Table 1: Mapped colours to the classified bounding boxes for visualisation
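The mapping of Table 1 amounts to a simple lookup used when painting classified bounding boxes into the visualisation:

```python
# Class-to-colour mapping from Table 1 (RGB values as given there).
CLASS_COLOURS = {
    "human":      (77, 157, 74),    # green
    "two humans": (255, 127, 0),    # orange
    "car":        (228, 26, 28),    # red
    "two cars":   (126, 126, 184),  # blue
}
print(CLASS_COLOURS["car"])  # (228, 26, 28)
```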



This training data is used in the next step (Multi Layer Perceptron Predictor, Decision Tree Predictor), as figure 4 shows. At the end of this step, the result is a table containing all bounding boxes of one sub-video.


Figure 4: Classification needs user interaction and computing using prediction algorithms






3 Determination of Suspicious Events

Once the patterns have been recognized, the suspicious events can be reviewed by visualisation of the patterns.


Figure 5: Patterns can be verified manually and marked for export to a result table
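A pattern check of this kind can be sketched as a filter over the classified rows (frame number, class label, bounding box). The concrete pattern below, "two humans inside an area of interest", and all coordinates are hypothetical, chosen only to illustrate the mechanism.

```python
# Sketch: scan classified rows for a pattern such as "two humans inside an
# area of interest". Pattern definition and coordinates are made up.

def overlaps(box, area):
    bx0, by0, bx1, by1 = box
    ax0, ay0, ax1, ay1 = area
    return bx0 <= ax1 and ax0 <= bx1 and by0 <= ay1 and ay0 <= by1

def suspicious_events(rows, area):
    """rows: (frame_no, label, box). Keep 'two humans' events inside area."""
    return [(f, lbl, box) for f, lbl, box in rows
            if lbl == "two humans" and overlaps(box, area)]

rows = [(10, "car",        (0, 0, 30, 10)),
        (11, "two humans", (50, 50, 60, 70)),
        (12, "two humans", (400, 0, 410, 20))]
print(suspicious_events(rows, (40, 40, 80, 80)))
# [(11, 'two humans', (50, 50, 60, 70))]
```

Events surviving this filter are the candidates the analyst reviews manually, as figure 5 shows.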




4 Result

4.1 Suspicious Events

As a result, the most relevant pattern is

This also means that two persons may walk down a street together (which implies a previous meeting).

The pattern:

needs to be redefined for another run, since too many events were found.



4.2 Performance Comparison of Automatic and Interactive Parts

Performance is assessed for the interactive and automatic parts of the process chain. Process times for the user as well as for the hardware (server, PC) are listed separately in table 2.

No.   Process step                                 time in min (user)   time in min (HW)
1     Frame Extraction                             0                    180 - 360
2     Set up Thresholds                            5 - 15               5 - 15
3     Determination of bounding boxes              0                    180 - 240
4     Classifying of a subset of bounding boxes    5 - 15               5 - 15
5     Filtering Boxes                              <1                   <1
6     Visualisation                                0                    <1
7     Pattern Recognition                          0                    <1
8     Pattern Recognition Review                   5 - 30               0

Table 2: Comparison of user and hardware process times for video 1



4.3 Data Reduction

Table 3 shows the reduction of data for video 1. The final relevant data was reduced to 0.008 % of the potentially relevant data.

No.   Process step                                 count of table rows (input)   count of table rows (output)
1     Frame Extraction                             0                             0
2     Set up Thresholds                            0                             0
3     Determination of bounding boxes              0                             143528
4     Classifying of a subset of bounding boxes    143528                        143528
5     Filtering bounding boxes                     143528                        93865
6     Visualisation                                93865                         93865
7     Pattern Recognition                          93865                         3859
8     Pattern Recognition Review                   3859                          12

Table 3: Data reduction of video 1 leads to relevant events
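The quoted reduction ratio follows directly from the first and last row counts of the pipeline (143528 bounding boxes down to 12 reviewed events):

```python
# Check of the data-reduction figure: 12 reviewed events out of
# 143528 initially determined bounding boxes.
initial, final = 143528, 12
print(f"{final / initial:.3%}")  # 0.008%
```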



4.4 Conclusion

Compared to the complete video time (4 h), the user interaction takes between 25 and 70 minutes. The VAT tool enables an analyst to focus her/his attention on a limited number of automatically preselected events, whereas it would otherwise be very difficult and exhausting to watch whole videos of several hours attentively.